Abstract

Background: Nellore cattle play an important role in beef production in tropical systems and there is great interest 
in determining if genomic selection can contribute to accelerate genetic improvement of production and fertility in 
this breed. We present the first results of the implementation of genomic prediction in a Bos indicus (Nellore) 
population. 

Methods: Influential bulls were genotyped with the Illumina Bovine HD chip in order to assess genomic predictive
ability for weight and carcass traits, gestation length, scrotal circumference and two selection indices. 685 samples 
and 320 238 single nucleotide polymorphisms (SNPs) were used in the analyses. A forward-prediction scheme was 
adopted to predict the genomic breeding values (DGV). In the training step, the estimated breeding values (EBV) 
of bulls were deregressed (dEBV) and used as pseudo-phenotypes to estimate marker effects using four methods: 
genomic BLUP with or without a residual polygenic effect (GBLUP20 and GBLUP0, respectively), a mixture model
(Bayes C) and Bayesian LASSO (BLASSO). Empirical accuracies of the resulting genomic predictions were assessed 
based on the correlation between DGV and dEBV for the testing group. 

Results: Accuracies of genomic predictions ranged from 0.17 (navel at weaning) to 0.74 (finishing precocity). Across 
traits, Bayesian regression models (Bayes C and BLASSO) were more accurate than GBLUP. The average empirical 
accuracies were 0.39 (GBLUP0), 0.40 (GBLUP20) and 0.44 (Bayes C and BLASSO). Bayes C and BLASSO tended to
produce deflated predictions (i.e. slope of the regression of dEBV on DGV greater than 1). Further analyses 
suggested that higher-than-expected accuracies were observed for traits for which EBV means differed significantly 
between two breeding subgroups that were identified in a principal component analysis based on genomic 
relationships. 

Conclusions: Bayesian regression models are of interest for future applications of genomic selection in this 
population, but further improvements are needed to reduce deflation of their predictions. Recurrent updates of the 
training population would be required to enable accurate prediction of the genetic merit of young animals. The 
technical feasibility of applying genomic prediction in a Bos indicus (Nellore) population was demonstrated. Further 
research is needed to permit cost-effective selection decisions using genomic information. 
Background

The possibility of accurately predicting the genetic merit 
of individuals based on their genotypes analyzed by dense 
single nucleotide polymorphism (SNP) marker panels, a 
process known as genomic selection (GS) [1,2], is revolu- 
tionizing the design and implementation of livestock 
breeding programs especially for dairy cattle. Schaeffer [3] 
highlighted the potential benefits of this strategy for dairy 
cattle in terms of reducing generation intervals, increasing 
prediction accuracies and selection intensities, reducing 
breeding organization costs and making it feasible to 
perform genetic evaluations of difficult-to-measure traits. 

The rationale behind genomic selection in livestock is 
that, given a marker density high enough to cover the 
entire genome, most of the quantitative trait loci (QTL) 
will be in high linkage disequilibrium (LD) with some 
of the markers. Therefore, the sum of all SNP effects 
(direct genomic value, DGV) will be a good predictor 
of the genetic merit of selection candidates and will 
enable selection decisions as soon as the genomic 
information of those individuals is available [4].

Thanks to the sequencing of the bovine genome [5] 
and the availability of dense panels of SNP markers, GS 
has moved from simulation approaches to practical application
in recent years. The first successful application
of GS was in dairy cattle (Holstein) [4,6] and motivated 
studies on GS in other breeds and populations [7-9]. 

Although several previous reports compared statistical 
methods applied to GS in cattle using the Illumina Bovine 
50 K chip (Illumina, San Diego, CA, USA) [7,10,11], only 
a few studies have carried out similar comparisons using 
high-density panels, such as the Illumina Bovine HD chip, 
which contains more than 700 000 SNPs [12]. In addition, 
most of the studies in this field were carried out using data 
from Bos taurus breeds. While previous studies have in- 
vestigated the application of GS in purebred and compos- 
ite populations of Bos indicus (Brahman) [13,14], the 
performance of GS in many other Bos indicus populations 
is unknown. 

Nellore cattle are the primary breed used in beef pro- 
duction in tropical systems. Thus, it is expected that 
genome-enhanced predictions could considerably con- 
tribute to improve the efficiency of breeding programs 
in such systems. Brazil has a large number of well- 
recorded Nellore animals obtained from several genetic 
evaluation initiatives [15] that have achieved significant 
genetic progress for growth traits in the last two decades 
through conventional selection, although progress for 
reproduction, meat quality and feed efficiency traits has 
been less significant during the same period [16]. 

Our aim was to create the scientific basis for the appli- 
cation of GS to Nellore cattle, by comparing genomic 
prediction results obtained with four different prediction 
methods on 15 traits of economic relevance in this breed. 



Methods 

Data 

Phenotypic and genotypic data were available for 691 in- 
fluential Nellore bulls. Genotypes were generated with 
the Illumina Bovine HD chip (Illumina, San Diego, CA, 
USA) and only autosomal SNPs with a GenCall (GC) 
score higher than 0.70 were considered for further ana- 
lyses. Fifty-four SNP pairs that had the same map coor- 
dinates were excluded from the dataset. Quality control 
of genotypes was carried out through an iterative process 
using the following SNP selection criteria: call rate (CR) 
higher than 0.98, minor allele frequency (MAF) higher 
than 0.02 and p-value for Hardy-Weinberg equilibrium test
(HWE) higher than 10"^. The SNPs that met these criteria 
were further screened to interrogate their linkage disequi- 
librium with syntenic SNPs located within a window of 
100 neighboring markers, resulting in only one marker 
from each pair of highly correlated SNPs ($r^2 > 0.995$)
remaining in the SNP dataset. Finally, samples showing CR 
lower than 0.90 were excluded from the analysis. The 
process was repeated until no further SNPs or samples 
were excluded, which resulted in a final dataset of 685 bulls 
with 320 238 SNPs. 
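
The iterative filtering described above can be sketched as follows. This is only a minimal illustration and not the pipeline used in the study: it applies the call-rate and MAF thresholds but omits the HWE and LD steps, it assumes genotypes coded as 0/1/2 with `None` for a missing call, and the function name `iterative_qc` is hypothetical.

```python
def iterative_qc(geno, snp_cr=0.98, maf_min=0.02, sample_cr=0.90):
    """Iteratively drop SNPs and samples until nothing more is removed.

    geno: list of samples, each a list of genotypes coded 0/1/2 (None = missing).
    Returns (kept_sample_indices, kept_snp_indices).
    """
    samples = list(range(len(geno)))
    snps = list(range(len(geno[0]))) if geno else []
    while True:
        # SNP-level filters, computed on the samples still in the dataset
        good_snps = []
        for j in snps:
            calls = [geno[i][j] for i in samples if geno[i][j] is not None]
            if not samples or len(calls) / len(samples) <= snp_cr:
                continue  # call rate not higher than the threshold
            freq = sum(calls) / (2 * len(calls))
            if min(freq, 1 - freq) > maf_min:
                good_snps.append(j)
        # Sample-level filter, computed on the SNPs that survived this pass
        good_samples = [
            i for i in samples
            if good_snps
            and sum(geno[i][j] is not None for j in good_snps) / len(good_snps) >= sample_cr
        ]
        # Stop when a full pass removes neither SNPs nor samples
        if good_snps == snps and good_samples == samples:
            return samples, snps
        snps, samples = good_snps, good_samples
```

Because SNP statistics are recomputed after samples are dropped (and vice versa), the loop mirrors the "repeat until no further exclusions" logic; the order in which filters are applied within a pass is an assumption of this sketch.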

Phenotypes were provided by the DeltaGen genetic 
evaluation program, a commercial beef cattle operation 
managed as an alliance of breeders distributed across 12 
Brazilian states [17]. The estimated breeding values 
(EBV) from routine genetic evaluations were deregressed 
and used as dependent variables to estimate SNP effects 
for 15 traits of economic relevance. These traits included 
weight and carcass traits, scrotal circumference, gesta- 
tion length and two selection indexes [See Additional file 1 
for detailed trait definitions]. The deregressed proofs 
(dEBV), as well as their associated reliabilities, were ob- 
tained according to the procedure proposed by [18], which 
removed parent average effects and also accounted for 
heterogeneous variances [9]. 

The genotyped individuals included 65 influential 
older bulls born between 1965 and 1990, while the 
remaining genotyped animals were younger. The dataset 
comprised up to four generations of genotyped animals, 
including 292 son-sire pairs, 139 grandson-grandsire 
pairs and 51 paternal half-sib families (average size = 4.7). 
[See Additional file 2 for more information about the age 
structure of the genotyped animals]. 

Genomic prediction design 

For each individual trait, a forward prediction scheme
was adopted, which split the dataset into a training
(reference) population that included bulls with EBV
accuracies greater than 0.50 in 2007, and a testing
population that included bulls that did not have accurate
EBV in 2007 but had EBV accuracies greater than 0.50
in 2011.
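
As a rough sketch of this partition, the following assigns bulls to the two sets from their EBV accuracies in each evaluation. The function name `forward_split`, the dictionary inputs and the bull identifiers are hypothetical; only the 0.50 threshold and the 2007/2011 logic come from the text.

```python
def forward_split(acc_2007, acc_2011, threshold=0.50):
    """Assign bulls to training/testing sets from EBV accuracies.

    acc_2007, acc_2011: dicts mapping bull id -> EBV accuracy in that
    evaluation (a bull absent from a dict had no EBV in that year).
    """
    # Training: accurate EBV already available in 2007
    training = sorted(b for b, a in acc_2007.items() if a > threshold)
    # Testing: accurate in 2011 but not in 2007
    testing = sorted(
        b for b, a in acc_2011.items()
        if a > threshold and acc_2007.get(b, 0.0) <= threshold
    )
    return training, testing
```

In the study the split was made separately for each trait, so a function like this would be called once per trait.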
The sizes of the training and testing datasets differed
between traits (Table 1). Most traits were moderately
heritable, with heritabilities ($h^2$) ranging from 0.25 (score
for carcass conformation and finishing precocity at 
weaning) to 0.49 (gestation length), with an average of 
about 0.30 (Table 1). Such heritability estimates were 
based on REML estimates of variance components, ob- 
tained using the same database from which the EBV 
employed in this study were obtained. For all traits, aver- 
age EBV accuracies were greater than 0.80 and 0.74 in 
the training and testing sets, respectively. 

In our study, model training was carried out using dEBV
based on the 2007 genetic evaluation ($dEBV_{2007}$), while
dEBV based on the 2011 genetic evaluation ($dEBV_{2011}$)
were used for validation purposes. Using $dEBV_{2007}$ for
model training ensured that information of own perform- 
ance (and/or progeny records) of the testing animals did 
not contribute to the dEBV of the training set, thus 
preventing overlapping information between training and 
testing sets, which could inflate the estimates of predictive 
ability of GS [19]. 

Because the dataset included many pairs of closely
related animals, the forward prediction scheme resulted
in many testing animals having close relatives in the
training set. The pattern of relationships between animals
in the training and validation sets was consistent
across traits [See Additional file 2].

Statistical methods 

The following statistical methods were used in order to 
estimate SNP effects and direct genomic values (DGV): 
(i) best linear unbiased prediction (BLUP) using a gen- 
omic relationship matrix (GBLUP), (ii) Bayesian regres- 
sion using a mixture model (Bayes C) and (iii) Bayesian 
LASSO (BLASSO). All methods only accounted for the 
allele substitution (additive) effects of the markers, i.e. 
apart from an overall mean, no other effects (environ- 
mental or genetic) were included in the models. 
The GBLUP model can be described as:

$$y = 1_n \mu + Zg + e, \qquad (1)$$

where $y$ is the vector of dEBV for the respective trait, $\mu$ is
the location parameter common to all observations, $1_n$ is
a vector of 1's, $Z$ is the incidence matrix relating genomic
breeding values to $y$, $g$ is the vector of genomic breeding
values and $e$ is the vector of random residual terms. It was
assumed that $g \sim N(0, G^{*}\sigma^2_g)$ and $e \sim N(0, R\sigma^2_e)$, where



Table 1 Summary statistics related to the estimated breeding values (EBV) of Bos indicus (Nellore) bulls included in
training and testing sets for 15 traits under forward prediction¹

| Trait² | $h^2$ | Training set: N³ | Training set: Mean EBV (SD)⁴ | Training set: Mean accuracy (SD)⁵ | Testing set: N³ | Testing set: Mean EBV (SD)⁴ | Testing set: Mean accuracy (SD)⁵ |
|---|---|---|---|---|---|---|---|
| WG | 0.26 | 494 | 1.60 (5.57) | 0.86 (0.12) | 187 | 4.11 (5.17) | 0.80 (0.11) |
| Cw | 0.25 | 472 | 0.10 (0.31) | 0.85 (0.12) | 185 | 0.19 (0.35) | 0.79 (0.12) |
| Pw | 0.25 | 472 | -0.03 (0.42) | 0.85 (0.12) | 184 | 0.21 (0.42) | 0.79 (0.12) |
| Mw | 0.26 | 473 | -0.02 (0.40) | 0.85 (0.12) | 185 | 0.20 (0.41) | 0.80 (0.11) |
| Nw | 0.27 | 468 | 0.02 (0.27) | 0.85 (0.12) | 188 | 0.06 (0.23) | 0.80 (0.11) |
| PWG | 0.33 | 473 | 0.66 (7.58) | 0.85 (0.12) | 115 | 2.83 (7.65) | 0.81 (0.10) |
| Cy | 0.31 | 454 | 0.13 (0.36) | 0.84 (0.13) | 118 | 0.29 (0.40) | 0.80 (0.11) |
| Py | 0.31 | 455 | -0.06 (0.55) | 0.83 (0.13) | 117 | 0.24 (0.53) | 0.80 (0.11) |
| My | 0.30 | 448 | -0.05 (0.51) | 0.84 (0.12) | 121 | 0.25 (0.50) | 0.79 (0.11) |
| Ny | 0.30 | 443 | 0.03 (0.30) | 0.84 (0.13) | 122 | 0.07 (0.26) | 0.79 (0.11) |
| SCaw | 0.40 | 446 | -0.22 (1.21) | 0.81 (0.14) | 115 | -0.15 (1.15) | 0.75 (0.12) |
| BW | 0.37 | 457 | 0.40 (1.35) | 0.86 (0.11) | 189 | 0.15 (1.25) | 0.83 (0.11) |
| GL | 0.49 | 307 | 0.17 (3.25) | 0.88 (0.10) | 138 | -0.77 (4.04) | 0.88 (0.10) |
| WI | | 479 | 2.74 (13.03) | 0.85 (0.12) | 185 | 9.83 (13.10) | 0.80 (0.11) |
| FI | | 465 | 0.86 (12.13) | 0.84 (0.12) | 130 | 8.17 (11.94) | 0.77 (0.13) |

¹Training set composed of bulls with accurate EBV in 2007 and testing set composed of remaining bulls with accurate EBV in 2011 but not in 2007; the summary
statistics were obtained considering the EBV obtained in either 2007 (training set) or 2011 (testing set); EBV were obtained with BLUP animal models; ²WG = weight gain
from birth to weaning (about 205 days of age); Cw, Pw, Mw, Nw = visual scores recorded at weaning for carcass conformation, carcass finishing precocity, muscling and
navel, respectively; PWG = weight gain from weaning to yearling (at 550 days of age); Cy, Py, My, Ny = visual scores recorded at yearling for carcass conformation,
carcass finishing precocity, muscling and navel, respectively; SCaw = scrotal circumference adjusted for age and weight; BW = birth weight; GL = gestation length;
WI = weaning index, composed of traits evaluated at weaning; FI = final index, composed of traits evaluated at weaning and yearling [See Additional file 1 for more
details]; ³N = sample size; ⁴Mean EBV (SD) = average (standard deviation) of estimated breeding values (EBV); ⁵Mean accuracy (SD) = average (standard deviation) of
EBV accuracies.
$G^{*}$ is a combined relationship matrix and $R$ is a diagonal
matrix, whose elements account for the differences
in the reliabilities of the observations in $y$, similarly to
[20]. The diagonal elements of $R$ ($R_{ii}$) were obtained
as $R_{ii} = (1 - r_i^2)/r_i^2$, where $r_i^2$ is the reliability associated
with the $i$-th dEBV, obtained following [18].
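
To illustrate how this weighting behaves, consider two hypothetical dEBV (the reliabilities 0.80 and 0.50 are illustrative values, not taken from the data). A more reliable observation receives a smaller residual variance factor and therefore more weight in the analysis:

```latex
R_{ii} = \frac{1 - r_i^2}{r_i^2}, \qquad
r_i^2 = 0.80 \;\Rightarrow\; R_{ii} = \frac{0.20}{0.80} = 0.25, \qquad
r_i^2 = 0.50 \;\Rightarrow\; R_{ii} = \frac{0.50}{0.50} = 1 .
```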

The matrix $G^{*}$ is a combined relationship matrix,
computed as $G^{*} = (1-w)G + wA$, where $G$ is the genomic
relationship matrix and $A$ is the regular numerator relationship
matrix, both of order equal to the number of
genotyped bulls. $G$ was defined as $G = MM'/\sum_i 2p_i(1-p_i)$,
in which $M$ is the incidence matrix of marker scores
whose elements in the $i$-th column are $0-2p_i$, $1-2p_i$ and
$2-2p_i$, depending on whether the animal's genotype was 11,
12 or 22, respectively, and $p_i$ is the frequency of
allele 2 at the $i$-th marker [20].

In the computation of the genomic relationship matrix 
G*, attributing a weight (w) for pedigree-based relation- 
ships is equivalent to fitting residual polygenic effects 
that are not captured by the markers [21]. After testing 
different values for w (ranging from 0 to 0.40), Gao et al. 
[21] reported that w = 0.20 provided the best comprom- 
ise in terms of reliability and scale of DGV. Since our 
aim was to investigate the benefit of this strategy, GBLUP
predictions were obtained setting w = 0 or w = 0.20, hereafter
referred to as GBLUP0 and GBLUP20, respectively.
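
The construction of the genomic relationship matrix and its blend with pedigree relationships can be sketched as below. This is a small pure-Python illustration, not the gebv implementation: genotypes are assumed to be coded as the number of copies (0/1/2) of allele 2, allele frequencies are computed from the genotyped animals supplied, and the function names `genomic_relationship` and `blend` are hypothetical.

```python
def genomic_relationship(genotypes):
    """G = MM' / sum_i 2 p_i (1 - p_i), with M holding genotypes centred by 2 p_i."""
    n, m = len(genotypes), len(genotypes[0])
    # Frequency of allele 2 at each marker, from the animals provided
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Centred marker scores: 0 - 2p, 1 - 2p or 2 - 2p
    M = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    scale = sum(2 * pj * (1 - pj) for pj in p)
    return [[sum(a * b for a, b in zip(M[i], M[k])) / scale for k in range(n)]
            for i in range(n)]


def blend(G, A, w):
    """Combined matrix G* = (1 - w) G + w A, for matrices of equal order."""
    return [[(1 - w) * g + w * a for g, a in zip(g_row, a_row)]
            for g_row, a_row in zip(G, A)]
```

With `w = 0` the blend returns `G` unchanged (the GBLUP0 setting), while `w = 0.20` gives the GBLUP20 setting; the pedigree matrix `A` must be supplied by the caller.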

Theoretically, allele frequencies from the unselected 
base population should be used to construct G [20], 
which could be estimated after using linear regression to 
predict gene content (number of copies of a particular 
allele in a genotype of an individual) of non-genotyped 
ancestors, based on the available information of geno- 
types and pedigree [22]. However, there is some evidence 
that similar accuracies of prediction are obtained using 
either base population or current allele frequencies 
[20,23]. Hence, in this study, G was constructed using 
current allele frequencies (computed considering all ge- 
notyped animals). The GBLUP method was imple- 
mented using the gebv software described in [24]. This 
formulation of the GBLUP method is equivalent to as- 
suming a normal distribution of SNP effects with con- 
stant variance across SNPs [25]. 

The Bayes C (BayesC) method consisted of fitting a
mixture model for SNP effects using the same model
equation as in (1), in which $y$, $\mu$, $Z$, $g$ and $e$ were
defined as before, but the elements of vector $g$ were
calculated for each animal as $\sum_{i=1}^{N} z_i a_i I_i$, where $z_i$ is the
genotype of the $i$-th marker, coded as the number of copies
of the reference allele, $a_i$ is the effect of marker $i$,
and $I_i$ is an indicator variable that is equal to 1 if the $i$-th
marker has a non-zero effect on the trait and 0
otherwise.
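
For a single animal, the sum that defines its genomic value under the mixture model can be written in a few lines. The function name `direct_genomic_value` and the numeric values are hypothetical and serve only to show how the indicator variable removes markers from the sum.

```python
def direct_genomic_value(z, a, indicator):
    """Sum of z_i * a_i * I_i over markers for one animal.

    z: genotype codes (copies of the reference allele) per marker
    a: marker effects
    indicator: 1 if the marker has a non-zero effect, else 0
    """
    return sum(zi * ai * ii for zi, ai, ii in zip(z, a, indicator))


# Three markers; the second one is excluded by its indicator
dgv = direct_genomic_value([0, 1, 2], [0.5, -0.2, 0.1], [1, 0, 1])
```

In a Gibbs sampler the indicator and the effects change at every iteration, so in practice a quantity like this would be averaged over the retained samples.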



Model parameters were estimated within a Bayesian
framework. It was assumed that $a_i \sim N(0, \sigma^2_a)$ and
$e \sim N(0, R\sigma^2_e)$. Scaled inverse chi-squared distributions, with $\nu$
degrees of freedom and scale parameter $S$, were assumed
for $\sigma^2_a$ and $\sigma^2_e$. Unlike the Bayes B method [2], this
mixture model assumes that SNP marker effects are
sampled from a single (normal) distribution, instead of
estimating marker-specific variances. An arbitrarily small
value of 4 was assumed for $\nu$, while the scale parameters
were derived according to [26]. $R$ was defined as described
before. A binomial distribution with probability
$\pi$ was assumed for $I_i$ and an informative beta distribution
($\alpha = 10^{8}$, $\beta = 10^{10}$) was assigned for $\pi$ (implying
that this parameter was kept fixed around 0.01). This
method was very similar to that proposed in [26], except
that $\pi$ was assumed to be known, as in [9].
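
The statement that this prior keeps the mixing probability essentially fixed near 0.01 can be checked from the moments of a beta distribution, assuming the shape parameters are $\alpha = 10^{8}$ and $\beta = 10^{10}$:

```latex
E[\pi] = \frac{\alpha}{\alpha + \beta}
       = \frac{10^{8}}{10^{8} + 10^{10}}
       = \frac{1}{101} \approx 0.0099,
\qquad
\mathrm{Var}[\pi] = \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}
       \approx 9.7 \times 10^{-13}
\;\Rightarrow\; \mathrm{SD}[\pi] \approx 10^{-6}.
```

A prior standard deviation of about $10^{-6}$ leaves the data almost no room to move $\pi$ away from its prior mean.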

The SNP effects were estimated using the Gibbs sam- 
pling algorithm implemented in the GS3 software [27]. 
A single chain with a length of 100 000 iterations was 
used. The burn-in period was 20 000 iterations and the 
thinning interval was 100 iterations. 

The model for Bayesian LASSO (BLASSO) was similar 
to the one in equation (1), except for the assumption 
about SNP marker effects. This implementation can be 
understood as a linear mixed model assuming an expo- 
nential prior distribution for variances of marker effects. 

Originally, the LASSO procedure [28] was a statistical 
method that combined both variable selection and 
shrinkage. Legarra et al. [11] proposed an alternative 
Bayesian implementation of this method, which we used 
here. Based on the parameterization proposed by these 
authors, the prior for individual SNP effects (ai) can be 
represented by: 

$P(a_i \mid \tau_i^2) \sim N(0, \tau_i^2)$ and $P(\tau_i^2 \mid \lambda) = (\lambda^2/2)\exp(-\lambda^2 |\tau_i^2|)$.

This parameterization implies that individual variances
for each SNP (i.e. $\tau_i^2$) are estimated, conditional on a
regularization parameter $\lambda$, which was estimated by
using a prior gamma distribution bounded between 0 
and 10^. Flat priors were assumed for o^^ and o^g and dif- 
ferences in reliabilities of dEBV were accounted for via 
the matrix R, as for the other methods. A single chain 
with a length of 100 000 iterations was generated using 
GS3 software. The burn-in period was 20 000 iterations 
and the thinning interval was 100 iterations. 

The programs used to compute genomic predictions 
handle missing markers internally. In GS3, missing calls 
for a given marker are set to the population mean for 
the respective marker, while in the gebv software missing 
genotypes are inferred using a pedigree-based algorithm. 
Due to the low frequency of missing genotypes (0.25%) 
the effects of different imputation procedures are ex- 
pected to be negligible, as already reported by [29] . 
Comparison criteria

The four statistical methods used to derive DGV were
evaluated based on comparison of DGV with $dEBV_{2011}$ of
animals from the testing set using the following statistics:

(i) Pearson's correlation between DGV and $dEBV_{2011}$,
divided by the average accuracy of $dEBV_{2011}$, was
computed as the empirical accuracy of prediction
($r_{TBV,DGV}$). This quantity can be used as a proxy for
the correlation of the DGV with the true breeding
value [4], which is why it is abbreviated as
"$r_{TBV,DGV}$". The average accuracy of $dEBV_{2011}$ was
computed as the average of the dEBV accuracies
calculated according to [18].

(ii) the slope of the regression of $dEBV_{2011}$ on DGV for
animals in the testing set ($b1_{dEBV,DGV}$) was evaluated
to measure the degree of inflation/deflation of
genomic predictions, i.e. the scale of the DGV
compared to that of dEBV. Estimates of $b1_{dEBV,DGV}$
close to 1 are indicative of predictions that are on
a scale similar to that of the dEBV.

(iii) the mean squared error of prediction (MSE)
between DGV and dEBV of animals in the testing
set was used as a measure of the overall fit of each
model to the data. Larger estimates of $r_{TBV,DGV}$ are
indicative of more reliable predictions and a lower
MSE is associated with a better overall fit, including
scale.
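
The three criteria listed above can be computed from paired DGV and dEBV values as in the following sketch. It is an unweighted, standard-library illustration under the assumption that all testing animals count equally; the function name `validation_stats` and the example numbers are hypothetical.

```python
from math import sqrt


def validation_stats(dgv, debv, mean_debv_accuracy):
    """Empirical accuracy, regression slope and MSE for a testing set."""
    n = len(dgv)
    mx, my = sum(dgv) / n, sum(debv) / n
    sxx = sum((x - mx) ** 2 for x in dgv)
    syy = sum((y - my) ** 2 for y in debv)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dgv, debv))
    corr = sxy / sqrt(sxx * syy)  # Pearson correlation r(dEBV, DGV)
    return {
        # (i) correlation scaled by the average accuracy of the dEBV
        "accuracy": corr / mean_debv_accuracy,
        # (ii) slope of the regression of dEBV on DGV
        "slope": sxy / sxx,
        # (iii) mean squared difference between DGV and dEBV
        "mse": sum((x - y) ** 2 for x, y in zip(dgv, debv)) / n,
    }


stats = validation_stats([0, 1, 2, 3], [0, 2, 1, 3], mean_debv_accuracy=0.80)
```

A slope above 1 means the DGV vary less than the dEBV they predict (deflated predictions), whereas a slope below 1 indicates inflated predictions.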

Alternative validation designs 

In addition to the forward prediction scheme (FORW), 
two alternative validation strategies were tested for 
GBLUP20 in order to investigate the impact of the genetic 
relationship between training and testing sets on the 
accuracy of genomic predictions in this population. These 
strategies were based on 5-fold cross-validation that either
separated animals in five groups of similar size at random
(RAND) or based on minimizing genetic relationships between
groups (DIST). For DIST, a k-means algorithm [30]
was applied, with the distance matrix built based on the
genomic relationships among genotyped animals, similar
to [9]. In the case of RAND and DIST, the dEBV generated
from the 2011 genetic evaluations were used for both
the training and testing steps and the average $r_{TBV,DGV}$
(calculated using the five folds) was used as a proxy for
the empirical accuracy of the DGV.

Impact of relatedness with training set on the accuracy of 
individual DGV 

In order to investigate the extent to which individual
accuracies of the DGV of animals in the testing set were
influenced by their relatedness with individuals from the
training set, under the forward prediction design, different
measures of genomic relatedness with animals in
the training set were calculated for each animal in the
testing set, based on the genomic relationship matrix
(G) used in GBLUP, similar to [31]. The maximum relationship
(maxr) and the average of the top 5 (ave5), 10
(ave10), 20 (ave20) and 50 (ave50) relationships between
each testing animal and all animals in the training set
were calculated. Since GBLUP allowed the calculation of
individual DGV accuracies based on elements of the
inverse of the coefficient matrix (hereafter, estimated
accuracies, or rPEV), the correlations of rPEV with the
different measures of relatedness with the training set
were determined.
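
The relatedness measures described above can be obtained directly from rows of the genomic relationship matrix, as in this sketch. The function name `relatedness_summary` and the index-based interface are hypothetical; the matrix is assumed to be a square list of lists indexed by animal.

```python
def relatedness_summary(G, test_idx, train_idx, top=(5, 10, 20, 50)):
    """Per testing animal: maximum and mean of the top-k relationships
    with animals in the training set."""
    out = {}
    for t in test_idx:
        # Relationships with all training animals, largest first
        rel = sorted((G[t][j] for j in train_idx), reverse=True)
        row = {"maxr": rel[0]}
        for k in top:
            top_k = rel[:k]  # uses all available values if fewer than k
            row[f"ave{k}"] = sum(top_k) / len(top_k)
        out[t] = row
    return out
```

Correlating each of these measures with the estimated accuracies (rPEV) across testing animals then gives the association examined in the study.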

Finally, the empirical accuracies and estimated accur- 
acies (averaged across animals in the testing set) were 
compared with the analytical expectation for accuracy of 
genomic predictions, calculated according to a formula 
proposed by [32] (i.e. Equation 1 in that study). This for- 
mula predicts the expected accuracy for an animal with- 
out phenotypic information and without close relatives 
in the training set, as a function of the number of ani- 
mals in the training set, the heritability of pseudo- 
phenotypes and effective number of chromosome seg- 
ments (Me), which was approximated using estimates of 
genome size (L) and effective population size (Ne), i.e. 
Me = 2NeL/ln(4NeL). For such calculations, a 30 Morgan
genome was assumed, the average reliability of the animals
in the training set was considered as the heritability
of pseudo-phenotypes, and markers were assumed to capture
80% of the genetic variance for all traits. A value of
120 was adopted for Ne, similar to the estimate obtained
by [33] for the population used in this study.
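
The expected-accuracy calculation can be sketched as follows. The effective number of segments follows the expression given in the text; the accuracy expression sqrt(N h² / (N h² + Me)) is the commonly cited deterministic form, and the way the 80% of genetic variance captured by markers enters (as a multiplicative factor equal to its square root) is an assumption of this sketch rather than a detail stated in the text. Function names are hypothetical.

```python
from math import log, sqrt


def effective_segments(ne, genome_morgans):
    """Me = 2 Ne L / ln(4 Ne L)."""
    return 2 * ne * genome_morgans / log(4 * ne * genome_morgans)


def expected_accuracy(n_train, h2, me, prop_captured=1.0):
    """Expected accuracy for an animal without phenotype or close relatives
    in the training set.

    prop_captured scales the result by sqrt(proportion of genetic variance
    captured by markers); this scaling is an assumption of the sketch.
    """
    return sqrt(prop_captured) * sqrt(n_train * h2 / (n_train * h2 + me))


# Values stated in the text: Ne = 120 and a 30 Morgan genome
me = effective_segments(120, 30)
```

With these inputs `me` is about 752 segments, so for training sets of a few hundred bulls the term N h² remains smaller than Me, which is why expected accuracies stay moderate.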

Results 

Minor allele frequency and linkage disequilibrium 

After quality control of the genotyping data (QC), the 
average (SD) minor allele frequency was 0.226 (0.144) 
and the average (median) linkage disequilibrium ($r^2$) between
pairs of adjacent markers was 0.293 (0.164).

Accuracy of genomic predictions 

Empirical accuracies of genomic predictions ($r_{TBV,DGV}$)
ranged from 0.17 (navel at weaning) to 0.74 (carcass finishing
precocity at yearling). The average empirical accuracy
across traits was 0.39 and 0.40 for GBLUP0 and
GBLUP20, respectively, and 0.44 for both BayesC and
BLASSO (Table 2). For traits measured in both periods,
empirical accuracies were from 18% to 61% higher at
yearling than at weaning.

For most traits, GBLUP20 resulted in slightly greater
accuracies than GBLUP0, although this advantage was
greater (12%) for gestation length, while for conformation
at weaning, GBLUP0 was 13% more accurate than
GBLUP20 (Table 2). In general, empirical accuracies of
BayesC were very similar to those of BLASSO and 
Table 2 Empirical accuracies and inflation of genomic predictions obtained for 15 traits of Bos indicus (Nellore) cattle
based on different methods

| Trait³ | r(TBV,DGV)¹: GBLUP0 | r(TBV,DGV)¹: GBLUP20 | r(TBV,DGV)¹: BayesC | r(TBV,DGV)¹: BLASSO | b1(dEBV,DGV)²: GBLUP0 | b1(dEBV,DGV)²: GBLUP20 | b1(dEBV,DGV)²: BayesC | b1(dEBV,DGV)²: BLASSO |
|---|---|---|---|---|---|---|---|---|
| WG | 0.28 | 0.27 | 0.37 | 0.37 | 0.79 | 0.85 | 1.45 | 1.39 |
| Cw | 0.21 | 0.18 | 0.22 | 0.23 | 0.85 | 0.88 | 1.12 | 1.10 |
| Pw | 0.43 | 0.45 | 0.49 | 0.49 | 1.08 | 1.12 | 1.37 | 1.35 |
| Mw | 0.43 | 0.44 | 0.49 | 0.49 | 1.09 | 1.14 | 1.41 | 1.39 |
| Nw | 0.17 | 0.17 | 0.20 | 0.19 | 0.75 | 0.85 | 1.01 | 0.99 |
| PWG | 0.53 | 0.56 | 0.50 | 0.51 | 0.92 | 1.06 | 1.47 | 1.43 |
| Cy | 0.29 | 0.30 | 0.29 | 0.29 | 0.98 | 1.14 | 1.30 | 1.26 |
| Py | 0.70 | 0.72 | 0.74 | 0.74 | 1.19 | 1.24 | 1.39 | 1.37 |
| My | 0.68 | 0.69 | 0.69 | 0.69 | 1.13 | 1.22 | 1.32 | 1.30 |
| Ny | 0.20 | 0.20 | 0.23 | 0.24 | 0.94 | 1.05 | 1.19 | 1.19 |
| SCaw | 0.68 | 0.71 | 0.72 | 0.72 | 1.27 | 1.44 | 1.68 | 1.65 |
| BW | 0.24 | 0.24 | 0.30 | 0.30 | 0.57 | 0.70 | 0.94 | 0.91 |
| GL | 0.22 | 0.24 | 0.36 | 0.36 | 0.90 | 1.09 | 2.35 | 2.12 |
| WI | 0.30 | 0.30 | 0.39 | 0.39 | 0.87 | 0.93 | 1.39 | 1.36 |
| FI | 0.49 | 0.51 | 0.55 | 0.54 | 1.01 | 1.11 | 1.40 | 1.37 |

¹Accuracies measured as the Pearson's correlation between direct genomic values (DGV) and deregressed EBV (dEBV) of the bulls in the testing set, r(dEBV,DGV),
divided by the average accuracy of dEBV in the testing set; ²Inflation of genomic predictions measured by the slope of the regression of dEBV on DGV, b1(dEBV,DGV);
the estimates of empirical accuracies and inflation refer to the forward prediction design; ³WG = weight gain from birth to weaning (about 205 days of age);
Cw, Pw, Mw, Nw = visual scores taken at weaning for carcass conformation, finishing precocity, muscling and navel, respectively; PWG = weight gain from weaning
to yearling (about 550 days of age); Cy, Py, My, Ny = visual scores taken at yearling for carcass conformation, finishing precocity, muscling and navel, respectively;
SCaw = scrotal circumference adjusted for age and weight; BW = birth weight; GL = gestation length; WI = weaning index, composed of traits evaluated at
weaning; FI = final index, composed of traits evaluated at weaning and yearling [See Additional file 1 for more details].



superior to those achieved with both implementations of 
GBLUP. The largest advantage of Bayesian regression 
methods over GBLUP20 in terms of empirical accuracy 
was obtained for gestation length (+48%), weight gain 
from birth to weaning (+35%), conformation at weaning 
(+25%) and birth weight (+25%). Conversely, GBLUP20 
was more accurate than Bayesian regressions for weight 
gain from weaning to yearling (+9.5%) and for conform- 
ation at yearling (+4.5%) (Table 2). 
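
As a consistency check on the averages quoted in the text, the per-trait empirical accuracies transcribed from Table 2 can be averaged for each method. The values below are the rounded two-decimal entries of the table (traits in table order), so percentage differences recomputed from them may differ slightly from those reported, which were presumably based on unrounded values.

```python
# Empirical accuracies from Table 2, traits in table order:
# WG, Cw, Pw, Mw, Nw, PWG, Cy, Py, My, Ny, SCaw, BW, GL, WI, FI
accuracy = {
    "GBLUP0":  [0.28, 0.21, 0.43, 0.43, 0.17, 0.53, 0.29, 0.70,
                0.68, 0.20, 0.68, 0.24, 0.22, 0.30, 0.49],
    "GBLUP20": [0.27, 0.18, 0.45, 0.44, 0.17, 0.56, 0.30, 0.72,
                0.69, 0.20, 0.71, 0.24, 0.24, 0.30, 0.51],
    "BayesC":  [0.37, 0.22, 0.49, 0.49, 0.20, 0.50, 0.29, 0.74,
                0.69, 0.23, 0.72, 0.30, 0.36, 0.39, 0.55],
    "BLASSO":  [0.37, 0.23, 0.49, 0.49, 0.19, 0.51, 0.29, 0.74,
                0.69, 0.24, 0.72, 0.30, 0.36, 0.39, 0.54],
}

# Average across the 15 traits, rounded to two decimals as in the text
mean_accuracy = {m: round(sum(v) / len(v), 2) for m, v in accuracy.items()}
```

The resulting means (0.39, 0.40, 0.44 and 0.44) agree with the averages reported for GBLUP0, GBLUP20, BayesC and BLASSO.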

Scale of genomic predictions and mean squared 
prediction error (MSE) 

The slope of the regression of dEBV on DGV ($b1_{dEBV,DGV}$)
was expected to be close to 1, which would indicate
that genomic predictions are on a similar scale as
the deregressed EBV, i.e. not inflated or deflated. In general,
both GBLUP0 and GBLUP20 outperformed the
Bayesian regression methods in terms of scale, i.e., for
most traits, predictions of DGV obtained with both
BayesC and BLASSO were deflated (Table 2). Predictions
from GBLUP20 tended to be slightly deflated, while
those from GBLUP0 tended to be slightly inflated
(Table 2). When averaged across traits, the slope of the
regression of dEBV on DGV was equal to 0.96, 1.05,
1.39 and 1.35 for GBLUP0, GBLUP20, BayesC and
BLASSO, respectively. However, for birth weight and
navel at weaning, BayesC and BLASSO clearly outperformed
GBLUP in terms of scale.

For most traits, the overall fit of the model to the data, 
judged by the mean squared prediction error (MSE), fa- 
vored both GBLUP methods over the Bayesian regres- 
sion methods (Table 3). However, for three of the traits 
(scrotal circumference, birth weight and gestation 
length), lower estimates of MSE were obtained for Bayes 
C and Bayesian LASSO (Table 3). 

Individual accuracy of DGV 

For most traits, the average accuracy of the DGV (rPEV) 
was around 0.46, ranging from 0.22 to 0.61 (Table 4). 
Correlations between accuracies estimated for individ- 
uals in the testing set and their relatedness with animals 
in the training set were strong. The best predictor for 
this association was the average of the top five relation- 
ships between a testing animal and animals in the train- 
ing set (ave5), for which the average correlation with 
rPEV across traits was 0.81. The maximum relationship 
between a testing animal and animals in the training set 
(maxr) also exhibited a strong association with rPEV 
(average correlation of 0.78). Across all animals in the 
testing set, the average maxr and ave5 were equal to 0.35
and 0.20, respectively (Table 4).
Table 3 Mean squared error (MSE) of genomic predictions
for 15 traits of Bos indicus (Nellore) cattle based on
different prediction methods

| Trait² | MSE¹: GBLUP0 | MSE¹: GBLUP20 | MSE¹: BayesC | MSE¹: BLASSO |
|---|---|---|---|---|
| WG | 164.1 | 165.0 | 212.6 | 209.6 |
| Cw | 0.7 | 0.70 | 0.8 | 0.9 |
| Pw | 0.9 | 0.9 | 1.2 | 1.3 |
| Mw | 0.8 | 0.8 | 1.1 | 1.2 |
| Nw | 0.3 | 0.3 | 0.5 | 0.7 |
| PWG | 194.4 | 192.6 | 291.0 | 335.7 |
| Cy | 0.8 | 0.8 | 1.2 | 1.3 |
| Py | 0.9 | 0.9 | 2.0 | 2.7 |
| My | 0.8 | 0.8 | 1.4 | 1.8 |
| Ny | 0.3 | 0.3 | 0.6 | 0.8 |
| SCaw | 6.1 | 6.0 | 6.0 | 5.8 |
| BW | 5.5 | 5.4 | 5.3 | 5.3 |
| GL | 48.9 | 48.4 | 47.6 | 47.5 |
| WI | 1029.4 | 1031.3 | 1409.1 | 1602.1 |
| FI | 708.5 | 704.1 | 1415.0 | 1568.9 |

¹MSE: mean squared prediction error, $MSE = \frac{1}{n}\sum (DGV - dEBV)^2$; this statistic
was calculated considering the bulls in the testing set, under the forward
prediction design; ²WG = weight gain from birth to weaning (about 205 days of
age); Cw, Pw, Mw, Nw = visual scores taken at weaning for carcass conformation,
finishing precocity, muscling and navel, respectively; PWG = weight gain from
weaning to yearling (about 550 days of age); Cy, Py, My, Ny = visual scores taken
at yearling for carcass conformation, finishing precocity, muscling and navel,
respectively; SCaw = scrotal circumference adjusted for age and weight;
BW = birth weight; GL = gestation length; WI = weaning index, composed of traits
evaluated at weaning; FI = final index, composed of traits evaluated at weaning
and yearling [See Additional file 1 for more details].



Expected accuracies 

When compared across traits, the mean (SD) of expected
accuracies based on Daetwyler's formula [32] was
equal to 0.49 (0.03). In general, although the average empirical
accuracies matched their expectations well, values
higher than expected were observed for some traits, 
notably for carcass finishing precocity and muscling 
evaluated at yearling, as well as for scrotal circumference 
(Figure 1). In contrast, for conformation at weaning and 
the navel traits, empirical accuracies were at least 50% 
lower than their expected values. 

Accuracy of genomic predictions with different validation 
strategies 

Across traits, empirical accuracies were on average 41%
smaller for DIST than for the RAND strategy (Figure 2).
The extent of relatedness between testing and training
animals was evaluated using statistics similar to maxr
and ave5 (described previously). For this, both maxr and
ave5 were averaged across the testing animals of each
fold and a pooled average was calculated based on the
averages of the five folds.
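
The relatedness statistics described above can be sketched as follows, assuming a genomic relationship matrix G indexed by animal: for each testing animal, the maximum relationship with any training animal and the average of its top k relationships with training animals (k = 5 in the text). The function names, the toy matrix and the use of k = 2 in the example are hypothetical.

```python
import numpy as np


def relatedness_stats(G, test_idx, train_idx, top_k=5):
    """Per testing animal: maximum relationship and mean of the top_k
    relationships with the training animals."""
    G = np.asarray(G, dtype=float)
    block = G[np.ix_(test_idx, train_idx)]  # rows: testing, columns: training
    if block.shape[1] < top_k:
        raise ValueError("fewer training animals than top_k")
    max_r = block.max(axis=1)
    # Sort each row in ascending order and keep the top_k largest values.
    ave_k = np.sort(block, axis=1)[:, -top_k:].mean(axis=1)
    return max_r, ave_k


def pooled_average(per_fold_values):
    """Average within each fold first, then average the fold means."""
    return float(np.mean([np.mean(v) for v in per_fold_values]))


# Toy relationship matrix for four animals (hypothetical values).
G = np.array([
    [1.0, 0.5, 0.2, 0.1],
    [0.5, 1.0, 0.3, 0.4],
    [0.2, 0.3, 1.0, 0.6],
    [0.1, 0.4, 0.6, 1.0],
])
max_r, ave_2 = relatedness_stats(G, test_idx=[0], train_idx=[1, 2, 3], top_k=2)
print(max_r, ave_2)
```

Averaging within folds before pooling gives each fold equal weight, regardless of fold size.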



The pooled averages of maxr and ave5 were 0.37 and
0.24, respectively, under the RAND strategy and were
thus slightly greater than those found for the forward
prediction strategy. For DIST, pooled averages of maxr
and ave5 were about 2-fold lower (0.20 and 0.12,
respectively) than for RAND and the forward prediction
strategy.

The empirical accuracies obtained for RAND and
DIST cannot be compared directly with those of the
forward prediction scheme, since dEBV from the same
genetic evaluation (dEBV2011) were used for both training
and testing animals for RAND and DIST, which is
expected to bias the predictive abilities upwards.

Considering all traits, the average proportion of
animals in the training set was about 83% for both RAND
and DIST, although there was greater variation in fold
sizes for DIST. The average size of the training set was
slightly smaller for the forward prediction scheme, which
could also contribute to slightly smaller empirical
accuracies with this strategy.

Discussion 

Linkage disequilibrium in Bos indicus (Nellore) cattle 

The pattern of linkage disequilibrium decay in indicine
populations differs from that observed in taurine
populations [34], and beef cattle have a lower level of LD at the
same distance than dairy cattle [35]. The average LD
between adjacent markers obtained with the Bovine HD
panel in Nellore cattle was similar to the values obtained
in Holstein populations with 50 k chips [4,36]. This level
of LD is sufficient to achieve accurate genomic
predictions in Bos indicus (Nellore) cattle [37], provided
enough phenotypic information is used to estimate
marker effects.

Genomic prediction methods 

Based on the empirical accuracies of prediction, BayesC
and BLASSO outperformed the two alternative
implementations of GBLUP, with few exceptions. For most
traits, however, GBLUP predictions had smaller MSE and a
scale more compatible with that of the deregressed EBV used
for validation, when compared to the Bayesian
regression methods.

Simulation studies have suggested the superiority of
methods based on some sort of variable selection over
GBLUP [2,38-40]. This advantage has not been
confirmed in many previous studies that compared different
methods using real data. In studies using real data,
GBLUP performed comparably to or better than variable
selection methods [4,7,10,11], although there is evidence
that substantially higher accuracies can be achieved
using variable selection methods for traits that are
known to be affected by genes of moderate-to-large
effects (e.g. traits affected by DGAT1, [6,11]).
Table 4 Summary statistics for the accuracy of individual DGV for testing set animals and its association with relatedness
with the training set for 15 traits of Bos indicus (Nellore) cattle

         rPEV                     Correlation (rPEV, relatedness)         Average relatedness
Trait    Average  Min     Max     maxr    ave5    ave10   ave20   ave50   maxr    ave5
WG       0.47     0.25    0.61    0.81    0.83    0.68    0.52    0.39    0.35    0.19
Cw       0.46     0.25    0.61    0.81    0.83    0.68    0.52    0.39    0.35    0.19
Pw       0.46     0.25    0.61    0.81    0.83    0.68    0.52    0.39    0.35    0.19
Mw       0.46     0.25    0.61    0.81    0.83    0.68    0.52    0.39    0.35    0.19
Nw       0.46     0.24    0.61    0.82    0.82    0.68    0.52    0.39    0.35    0.19
PWG      0.47     0.27    0.61    0.72    0.79    0.66    0.52    0.42    0.36    0.20
Cy       0.47     0.25    0.60    0.71    0.79    0.65    0.52    0.43    0.36    0.20
Py       0.47     0.25    0.60    0.72    0.79    0.65    0.52    0.43    0.36    0.20
My       0.46     0.25    0.60    0.72    0.79    0.65    0.51    0.42    0.35    0.20
Ny       0.46     0.25    0.60    0.72    0.80    0.66    0.52    0.43    0.36    0.20
SCaw     0.45     0.23    0.60    0.71    0.82    0.69    0.55    0.44    0.35    0.20
BW       0.46     0.25    0.61    0.83    0.82    0.67    0.50    0.38    0.35    0.19
GL       0.44     0.22    0.60    0.84    0.84    0.70    0.55    0.48    0.33    0.18
WI       0.46     0.25    0.61    0.81    0.83    0.68    0.52    0.39    0.35    0.19
FI       0.46     0.25    0.60    0.78    0.80    0.65    0.50    0.39    0.35    0.20

rPEV: estimated theoretical DGV accuracy, calculated based on the diagonals of the inverse of the coefficient matrix in GBLUP20. Correlation (rPEV, relatedness): association evaluated by the correlation
between individual DGV accuracy and each measure of relatedness of testing set animals with the training set, calculated either as the maximum relationship (maxr)
or as the average of the top 5 (ave5), 10 (ave10), 20 (ave20) or 50 (ave50) relationships between each testing animal and all training set animals. Average relatedness: averages of
relatedness of testing set animals with the training set, evaluated through maxr or ave5. Traits: WG = weight gain from birth to weaning (about 205 days of age); Cw, Pw,
Mw, Nw = visual scores taken at weaning for carcass conformation, finishing precocity, muscling and navel, respectively; PWG = weight gain from weaning to
yearling (about 550 days of age); Cy, Py, My, Ny = visual scores taken at yearling for carcass conformation, finishing precocity, muscling and navel, respectively;
SCaw = scrotal circumference adjusted for age and weight; BW = birth weight; GL = gestation length; WI = weaning index, composed of traits evaluated at
weaning; FI = final index, composed of traits evaluated at weaning and yearling (FI) [See Additional file 1 for more details].
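
The footnote above states that rPEV was derived from the diagonals of the inverse of the coefficient matrix. A minimal sketch of one standard way to do this is given below. It assumes mixed model equations scaled by the residual variance, so that the prediction error variance (PEV) of animal i is the i-th diagonal element multiplied by the residual variance, and the accuracy is sqrt(1 - PEV / genetic variance). Whether the study used exactly this scaling is not stated here; the function name and the input values are hypothetical.

```python
import numpy as np


def accuracy_from_pev(c_inv_diag, sigma2_e, sigma2_g):
    """Individual accuracies from diagonals of the inverse coefficient matrix.

    Assumption: PEV_i = c_inv_diag[i] * sigma2_e and
    accuracy_i = sqrt(1 - PEV_i / sigma2_g).
    """
    pev = np.asarray(c_inv_diag, dtype=float) * sigma2_e
    reliability = 1.0 - pev / sigma2_g
    # Guard against small negative values caused by rounding.
    reliability = np.clip(reliability, 0.0, 1.0)
    return np.sqrt(reliability)


# Hypothetical diagonals for three testing-set animals.
print(accuracy_from_pev([0.75, 0.84, 0.91], sigma2_e=1.0, sigma2_g=1.0))
```

Averaging these individual values over the testing set would give a single estimated accuracy per trait, as reported in the first column of the table.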




[Figure 1: bar chart; legend: GBLUP0, GBLUP20, BayesC, BLASSO, expected, rPEV]

Figure 1 Comparison of empirical and estimated theoretical accuracies (rPEV) with their expectations for 15 traits* in Bos indicus
(Nellore) cattle. Colored bars indicate: empirical accuracies calculated as the Pearson's correlation between deregressed proofs in 2011 for the
bulls in the testing set and their DGV, divided by the average accuracy of dEBV in the testing set (empirical accuracies were obtained using four
methods of prediction: GBLUP20, GBLUP0, Bayes C and Bayesian LASSO); estimated accuracies (rPEV) were calculated by averaging the individual
accuracies (obtained based on diagonal elements of the inverse of the coefficient matrix in GBLUP20) across all animals in the testing set;
expected accuracies were calculated with the analytical formula proposed by Daetwyler et al. [32]; *WG = weight gain from birth to weaning
(about 205 days of age); Cw, Pw, Mw, Nw = visual scores taken at weaning for carcass conformation, finishing precocity, muscling and navel,
respectively; PWG = weight gain from weaning to yearling (about 550 days of age); Cy, Py, My, Ny = visual scores taken at yearling for carcass
conformation, finishing precocity, muscling and navel, respectively; SCaw = scrotal circumference adjusted for age and weight; BW = birth weight;
GL = gestation length; WI = weaning index, composed of traits evaluated at weaning; FI = final index, composed of traits evaluated at weaning
and yearling (FI) [See Additional file 1 for more details].
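
The empirical accuracy defined in the caption above (Pearson's correlation between DGV and dEBV in the testing set, divided by the average accuracy of dEBV) can be sketched as follows. The function name and the example values are hypothetical; only the definition itself comes from the caption.

```python
import numpy as np


def empirical_accuracy(dgv, debv, debv_accuracy):
    """Correlation between DGV and dEBV, scaled by the mean dEBV accuracy."""
    dgv = np.asarray(dgv, dtype=float)
    debv = np.asarray(debv, dtype=float)
    r = np.corrcoef(dgv, debv)[0, 1]
    return float(r / np.mean(debv_accuracy))


# Hypothetical values for five testing-set bulls.
dgv = [0.2, 0.5, 0.1, 0.9, 0.4]
debv = [0.3, 0.4, 0.3, 0.7, 0.8]
debv_acc = [0.90, 0.85, 0.92, 0.88, 0.90]
print(empirical_accuracy(dgv, debv, debv_acc))
```

Dividing by the mean accuracy of dEBV corrects for the fact that dEBV are themselves imperfect measures of true breeding values, so the raw correlation understates the accuracy of DGV.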
[Figure 2: bar chart; x-axis tick labels partially recovered: Cy, Py, My, Ny, SCaw]

Figure 2 Empirical accuracies of genomic predictions for 15 traits** of Bos indicus (Nellore) cattle for different validation strategies*.

*forward = training set composed of bulls with highly accurate EBV in 2007 and testing set composed of the remaining bulls (with accurate
EBV in 2011); RAND = 5-fold cross-validation (CV), splitting animals randomly into groups of similar size; DIST = 5-fold cross-validation, based on
k-means clustering of animals based on their genomic distance (i.e. minimizing inter-group relationships); empirical accuracies were calculated as
the Pearson's correlation between DGV (obtained with GBLUP20) and deregressed EBV (dEBV) in 2011 for the testing set, divided by the average
accuracy of dEBV in the testing set; for the cross-validation strategies (RAND and DIST), the bars and error bars represent the estimates of means
and standard errors obtained in 5-fold CV, respectively; **WG = weight gain from birth to weaning (about 205 days of age); Cw, Pw, Mw, Nw = visual
scores taken at weaning for carcass conformation, finishing precocity, muscling and navel, respectively; PWG = weight gain from weaning to yearling
(about 550 days of age); Cy, Py, My, Ny = visual scores taken at yearling for carcass conformation, finishing precocity, muscling and navel, respectively;
SCaw = scrotal circumference adjusted for age and weight; BW = birth weight; GL = gestation length; WI = weaning index, composed of traits evaluated
at weaning; FI = final index, composed of traits evaluated at weaning and yearling (FI) [See Additional file 1 for more details].



The benefit of using variable selection methods is
expected to be higher when the number of markers is
much greater than the number of genotyped animals.
None of the previous GS studies on real data involved
such a large difference between the number of animals
in the training set and the number of genotyped SNPs; thus,
our study represents a scenario in which the use of
variable selection methods was expected to provide some
benefit. Erbe et al. [12] also confirmed the advantage of
a variable selection method (Bayes R) over GBLUP
after analyzing GS in dairy cattle using the same type of
high-density panel as we used. These authors suggested
that variable selection methods must be used to take full
advantage of the increased marker density. The larger
empirical accuracies that we obtained with BayesC and
BLASSO here corroborate this hypothesis.

Moreover, the considerably greater empirical
accuracies that we obtained with BayesC and BLASSO for
some of the traits may also suggest the segregation of
genes of larger effect for these traits. For instance, a
recent GWAS (genome-wide association study) for birth
weight that used data from this same Bos indicus
population provided evidence that a region on chromosome
14 had an important effect on this trait [41]; this region
had previously been shown to be associated with body size
in taurine cattle [42].

The simulation study in [32] provided evidence that
the relative advantage of variable selection methods over
GBLUP depends on the number of QTL (Nqtl)
underlying the trait. When Nqtl is greater than the
effective number of chromosome segments, GBLUP
should perform as well as or better than variable selection
methods. In the present study, weight gain from weaning
to yearling was the only trait for which a clear advantage
in terms of empirical accuracy was observed for GBLUP.

The availability of individual accuracies and a shorter
computation time are potential advantages of GBLUP over
Bayesian regression methods. Processing all 15 traits took
less than one minute with GBLUP, while about two days were
required for each of the two Bayesian regression methods
(data not shown). Such large differences in computing time
are partly due to the fact that the number of genotyped
animals is much smaller than the number of markers, and
these differences are expected to decline as the number of
genotyped animals increases.

Use of a combined relationship matrix in GBLUP 

For most traits, slightly greater empirical accuracy was
achieved with GBLUP20 than with GBLUP0, which
confirms the results reported by [21]. However, contrary
to what these authors indicated, the use of GBLUP20
showed no clear evidence of improvement in the scale of
the DGV. This, and the fact that GBLUP0 was slightly
more accurate than GBLUP20 for some traits, suggests
that the optimal weight (w) for pedigree-based
relationships in this alternative implementation of GBLUP may
be trait-specific, as pointed out by [43].
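
To make the role of the weight w concrete, the following minimal sketch blends a genomic relationship matrix G with a pedigree-based relationship matrix A as (1 - w) * G + w * A. It assumes that GBLUP20 corresponds to w = 0.2 and GBLUP0 to w = 0; the exact construction used in the study is described in its Methods, so this form, the function name and the toy matrices are assumptions for illustration only.

```python
import numpy as np


def combined_relationship_matrix(G, A, w=0.2):
    """Weighted combination of genomic (G) and pedigree (A) relationships."""
    G = np.asarray(G, dtype=float)
    A = np.asarray(A, dtype=float)
    if G.shape != A.shape or G.shape[0] != G.shape[1]:
        raise ValueError("G and A must be square matrices of equal size")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie between 0 and 1")
    return (1.0 - w) * G + w * A


# Toy matrices for two animals (hypothetical values).
G = np.array([[1.0, 0.4], [0.4, 1.0]])
A = np.array([[1.0, 0.5], [0.5, 1.0]])
print(combined_relationship_matrix(G, A, w=0.2))
```

A trait-specific w, as suggested above, would amount to evaluating several values of w per trait and comparing the resulting validation accuracies.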
Accuracies of genomic predictions 

In the present study, both empirical and estimated
accuracies of genomic predictions matched their
expectations relatively well, but some noteworthy deviations
were found. The relatively small number of animals
analyzed (n = 685) restricted the formation of training and
testing sets to small numbers of individuals, which led to
some degree of variation due to sampling, especially
when correlations were calculated.

Lower-than-expected empirical accuracies were
estimated for some traits, notably for conformation at
weaning and navel. This could be explained by
inadequacy of the model used for SNP effect estimation, for
instance if the marker density was not sufficiently high
to track all genetic variation associated with these traits.
In addition, for all traits, expected accuracies were based
on the assumption that markers explain 80% of the
genetic variance, and the adequacy of this assumption may
be trait-specific. Another potential source of noise is
related to the fact that the response variables used in
model training are prone to prediction errors, the extent
of which can also vary across traits.

Further analyses suggested that empirical accuracies
greater than expected could be caused by population
stratification. As already reported by [41] for this same
population, two breeding subgroups were observed in a
principal component analysis based on genomic
relationships [see Additional file 3]. Additional investigations
also showed higher-than-expected accuracies for
traits for which EBV means differed significantly
between these subgroups (data not shown).

The existence of these subgroups is consistent with
two different artificial selection criteria that were applied
in this population in the past 20 years [41]. The major
differences between such selection criteria consist of
largely different emphases on the traits for which we
observed higher-than-expected accuracies in the present
study (i.e. carcass finishing precocity at yearling,
muscling at yearling and scrotal circumference).
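
The principal component analysis based on genomic relationships mentioned above can be sketched as an eigendecomposition of the genomic relationship matrix. The example below is a minimal illustration under the assumption that G was built from centered genotypes, so that no further centering is applied; the function name and the toy matrix with two artificial subgroups are hypothetical.

```python
import numpy as np


def principal_components(G, n_components=2):
    """Leading principal component scores from a relationship matrix."""
    G = np.asarray(G, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(G)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    # Scale eigenvectors by the square root of their eigenvalues.
    scores = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))
    explained = eigvals[order] / eigvals.sum()
    return scores, explained


# Toy matrix: six animals in two subgroups of three (hypothetical values).
z = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
G = 0.3 * np.outer(z, z) + 0.7 * np.eye(6)
scores, explained = principal_components(G)
print(scores[:, 0], explained[0])
```

In this toy case, the sign of the first component separates the two subgroups, which is the kind of pattern that would indicate stratification in real data.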

The results of this study seemed to confirm the
association reported by [31] between the accuracy of
individual DGV and the relatedness of testing and training
animals, although the strength of this association was
lower in the present study. The authors of [31] found
that the average of the top 10 relationships with training
animals (ave10) was a better predictor of estimated
accuracies than the maximum relationship, while the
opposite was observed in this study.

For some traits, the average of the estimated individual
accuracies was consistent with the empirical accuracies,
while this did not hold for other traits. Clark et al. [31]
also showed that, while estimated and empirical
accuracies agreed well for simulated data and for eye muscle
depth in Merino sheep, larger differences between these
two sets of accuracies were found for live weight in the
same population.

Reasonable evidence for an association between the
relationship of the animals in the testing set and the
training set and the accuracies of DGV was found, which
confirmed the report by [38]. The cross-validation
strategies applied in the present study (RAND and DIST)
indicated that the same association also holds for
empirical accuracies. In this context, when comparing RAND
and DIST, we observed that empirical accuracies nearly
halved with a 2-fold decrease in average relatedness
between testing and training set animals. A consequence of
this observation is the possibility of evaluating to what
extent the relationship between selection candidates and
training animals would affect the accuracy of genomic
predictions. Based on the estimates of relationships
calculated according to ave5 and maxr under the forward
prediction scheme, it is expected that the accuracy of
DGV prediction will not differ much from the values
reported here, given that the sire or a few half-sibs are
included in the training set for most selection candidates
in this population. Thus, application of GS in this
population requires a dynamic training set, because recurrent
inclusion of new sires in the training population is
necessary to enhance predictions of the genetic merit of
young animals [9].

Scale of genomic predictions 

Although our study mainly focused on the accuracies of
genomic predictions, depending on the selection scheme,
the scale of predictions should be a matter of concern,
especially to determine whether DGV can be compared to
traditional EBV from routine evaluations. For example, in
situations in which both progeny-tested and newborn
animals are selection candidates, an artificial overestimation
of the genetic trend would lead to undue exaggeration of
DGV over traditional EBV, as discussed by [44].

Although Bayes C and BLASSO were more accurate
than GBLUP for most traits in the present study, these
Bayesian regression methods tended to generate
deflated predictions. Previous studies have found large
differences in the scale of genomic predictions
obtained using Bayesian regression. Some of these studies
do not agree with the trend of deflation we observed
here (e.g. [12,45]), while other methods similar to
BayesC and BLASSO also resulted in deflated
predictions for some traits analyzed [9,11,46]. This variation
in scale may be related to differences inherent to the data
analyzed (e.g. the extent to which training animals were
pre-selected) and to differences in the implementation of
the methods. Future studies should investigate whether
including a residual polygenic effect in these Bayesian
regression models could improve the scale of genomic
predictions, as suggested by [45].
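
The scale of predictions discussed above is assessed in this study through the slope of the regression of dEBV on DGV, with a slope greater than 1 indicating deflated predictions. A minimal sketch of that calculation is given below; the function name and the example values are hypothetical.

```python
import numpy as np


def regression_slope(debv, dgv):
    """Slope of the simple linear regression of dEBV on DGV."""
    debv = np.asarray(debv, dtype=float)
    dgv = np.asarray(dgv, dtype=float)
    covariance = np.cov(dgv, debv, ddof=1)[0, 1]
    return float(covariance / np.var(dgv, ddof=1))


# Hypothetical case: DGV vary half as much as dEBV, so the slope is 2.
dgv = [1.0, 2.0, 3.0]
debv = [2.0, 4.0, 6.0]
slope = regression_slope(debv, dgv)
print(slope, "deflated" if slope > 1 else "not deflated")
```

A slope close to 1 indicates that DGV are on the same scale as dEBV, which matters when DGV are compared directly with traditional EBV.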
Future work 

Because the selection candidates in this population have
their own performance data recorded before selection
decisions take place, the accuracy of traditional EBV based
on own performance could be a suitable reference to
evaluate the gain in accuracy that can be attributed to
GS. Unfortunately, this information was only available
for a small subset of the testing animals in our study,
because a considerable proportion of the animals
were born and had own performance data recorded
within other breeding programs, although they had
enough progeny recorded in the dataset available for this
study to obtain accurate EBV in 2011. A proper
comparison between empirical accuracies of traditional EBV
and DGV should be carried out as soon as more
information is available. In addition, DGV accuracies are
expected to increase when more animals are genotyped.

Another topic that deserves further investigation is
the identification of an optimal marker density for
genomic prediction in the population analyzed. Theoretically,
a higher marker density is expected to increase the
accuracy of genomic predictions, due to stronger LD
between markers and QTL [47]. Previous studies that
compared genomic predictions obtained with high-density
(~777 000 markers, HD) and medium-density
panels (~54 000 markers, 54 k) in Bos taurus breeds
reported only a marginal increase in accuracies when using
high-density panels [12,47]. Because the size of the
reference population in this study is small, the possible benefits
of an increased marker density could be counterbalanced
by an increase in the number of unknown parameters to
be estimated, as previously suggested by [47]. For instance,
in a Jersey population, when genomic predictions were
obtained with a training set of size comparable to that of the
present study, the accuracy of the DGV decreased slightly
when moving from 54 k to HD [12]. The relative benefit
of genomic predictions obtained at different marker
densities will be evaluated when more information is available.

While these initial results seem to confirm the
technical feasibility of applying genomic selection in a Bos
indicus (Nellore) beef cattle population, further work is
needed on the design of breeding schemes for this
particular breed. In this context, imputation methods
will probably play an important role in improving the
cost-effectiveness of this technology, as suggested by [48].

Conclusions 

The technical feasibility of applying genomic prediction
in a Bos indicus (Nellore) population was demonstrated,
although further research on its implementation in
breeding schemes is necessary to enable more cost-effective
selection decisions using genomic information. Bayesian
regression models (Bayes C and BLASSO) were more
accurate than GBLUP for most traits and are of interest for
future applications of genomic selection in this
population, but further improvements are needed to reduce
deflation of the predictions obtained with such methods.
The accuracies of genomic predictions depended on the
extent of relatedness between training and testing set
animals, which means that recurrent updates of the training
population are necessary to enhance predictions of the
genetic merit of young animals.